Going with gawk has been a great diversion. It gave me a fresh perspective on what I needed to do for Storybot. I also had the chance to look at a bunch of other languages (being the language junkie I am) to find a suitable production replacement for the gawk scripts. Only to find...
While OCaml, Scheme (Gambit, Scheme48, Scsh, etc.), Erlang, etc. are great languages for building solid apps, they lack some of the libraries I need (stuff for FTP, SMTP, etc.). Some of the languages had most of what I needed, but there was always something missing.
The point of EFX is not to be a language hero, and Storybot is the least interesting part of EFX. EFX core is the most interesting part and it's going to stay in Tcl. While something lisp-ish would be nice, the core needs a basic extension language and Tcl is still the best fit.
So... why not Tcl for Storybot too? I'm coming to that conclusion. I'm almost there...
Permalink | Tuesday, January 31 8:49 AM
Or, how to choose a programming language to rewrite Storybot.
With the prototype Storybot just about done, I am considering a rewrite. As the last entry mentions, this rewrite is intended to be monolithic. Why? A single Storybot executable will replace all sorts of unix dependencies: cron, pipes, bourne shell. I envision Storybot consisting of an executable and the StoryBot.cfg file. That would be nice (and portable). EFX would remain Tcl (either Tcl + Awk or Tcl + tcllib/TclXML).
I must admit that I tend to approach development of final solutions with much more engineering rigor than I usually let on. However, that rigor is often fed by theory (computer science). I want Storybot to be safe, fast and reliable. I don't believe that C/C++ and Java will get me there. Neither will a dynamic language such as Tcl, Perl, Python, Ruby or even Lisp/Scheme. I want to specify exactly how Storybot should behave and implement it as such -- with no allowance for runtime errors (I hope!).
For example, I eventually developed a POP3 client (in gawk) that does exactly what I want it to do. Many third-party programs and libraries have quirks (defects, idiosyncrasies in how they log errors, etc.) that I don't like. I know exactly what I want my POP3 client to do -- no more, no less. The same goes for the MIME parser, SMTP client, FTP client and other common utilities.
I don't want to conform to a third-party library (and I certainly don't want it to be the deciding factor in what language I choose). I want full control over my resulting executable and I want it to be correct. And, I would like the programming language to force me to be explicit. This probably explains my year-long infatuation with type systems. I am (re)looking at OCaml right now.
----
Permalink | Saturday, January 21 1:40 PM
This doesn't mean much regarding a release timeframe. It does mean, however, that the school site (storybot news portion) will go live content mgmt-wise. Not all of the Storybot stuff is there. But, it's enough to go live.
I think I've reached a point where I am ready to choose a final implementation language for Storybot. The EFX core (and extension language) is still Tcl. I think it is version 1.0 solid. I have repeatedly rebuilt the school website and this blog with it and haven't encountered any major problems recently. But, Storybot in gawk was never meant to be a final solution.
Implementing Storybot in gawk has allowed me to play around with some ideas and eventually derive a Storybot/EFX protocol. This was the important part. But, does this mean a complete rewrite? Won't that take forever? And, what about the poor users of the alpha EFX? How will I handle support?
Well, the nice thing about the gawk implementation is that it is broken up into over a dozen individual apps (scripts). I can replace each of them one-by-one! But, with what? That answer later, but we may eventually see a monolithic storybot executable replace all of the scripts. The next question is how.
Right now, there is still the missing SMTP client piece. I was going to do this in gawk or just use Tcl's tcllib mime package, but this may be the first app to tackle using the monolithic executable. After that, I think I'll start from the left of the pipeline and consume one script at a time (pop3fetch | maketicket | headers ..) into the monolith. That way, I can continue to support the currently deployed EFX and work toward a new version.
Permalink | Thursday, January 19 3:38 PM
The double-up bug appearing in the two previous posts has been fixed. It was introduced by the new script that handles deleting stories (don't ask).
This bug tells me that there is refactoring to do. I need to clean up the current code before diving into the full-blown email-based management code. The delete capability was a hack added to support a remote capability: to allow me to delete bad content posts from the school's website. Next, I need to fully design and implement the real delete, edit, confirmation response, etc. email handler.
But, before I start that grand endeavour, I need to clean up the current code. Gee, maybe I can start working on releasing the code too (it's too ugly to look at now -- you'll turn to stone).
Permalink | Wednesday, January 18 3:57 PM
The last post got doubled. Argh. Don't you hate finding bugs in server-side code when you don't have (immediate) access to the server to fix it?
Permalink | Tuesday, January 17 9:54 AM
EFX has an unusual architecture. I haven't released the code yet, but if you were to see how it was arranged, it might surprise you.
EFX is composed of Tcl and gawk scripts tied together with a unix shell (bash/bourne). Because of this, you need a unix system (linux, solaris, Mac-OSX, *bsd) or cygwin (unix tools for MS windows) to run the software. Actually, here is the complete (current) list of tools/commands that are used:
Now, here is the unusual bit: EFX as driven by Storybot (the content driver -- it downloads email and kicks off processing) is composed from a unix pipeline of about 15 co-processes.
Huh?
Well, it's actually quite simple. Here is the Storybot command:
pop3fetch | maketicket | headers | verify | handledeletes | mimesplit | decode64 | decodequotedp | fiximgfile | choosebodyformat | formathtml | formattext | genstory | submitstory | cleanup
Pop3fetch downloads each mailpiece and hands it off to maketicket which then... By the time we make it to submitstory, we are actually invoking the EFX engine to (re)build the site (which conveniently only happens when we have downloaded all email and pop3fetch exits!). Each of these commands is a small gawk script.
But, why? For heaven's sake, why?
It's actually pretty efficient. Unix was built for co-processes communicating via pipes. Each command does a very specific thing (well, I still need to refactor submitstory since it does 3 things: submits a story, runs EFX and publishes the website!). But, the main benefit to doing all of this small script pipelining is this: I can debug/rewrite/replace any command at any time. If I want to drop in a C, Tcl or Perl replacement for any command, all the replacement has to do is adhere to the protocol between the commands.
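To make the "adhere to the protocol" idea concrete, here is a minimal sketch of what a drop-in stage could look like in Python. The real protocol between the commands is not shown in this post, so the ticket format used here (one ticket per line, tab-separated key=value fields) is purely an assumption, as are the names parse_ticket, format_ticket, tag_sender_stage and the "verified" field.

```python
import sys
from typing import Dict, Set, TextIO


def parse_ticket(line: str) -> Dict[str, str]:
    """Parse one ticket line in the ASSUMED format: tab-separated key=value fields."""
    fields: Dict[str, str] = {}
    for part in line.rstrip("\n").split("\t"):
        if not part:
            continue
        key, _, value = part.partition("=")
        fields[key] = value
    return fields


def format_ticket(fields: Dict[str, str]) -> str:
    """Serialize a ticket back into the same assumed line format."""
    return "\t".join(f"{k}={v}" for k, v in fields.items()) + "\n"


def tag_sender_stage(src: TextIO, dst: TextIO, allowed: Set[str]) -> None:
    """Hypothetical stage: read tickets from src, add a 'verified' field, write to dst.

    All state travels in the ticket itself, so the stage keeps nothing
    between tickets and can sit anywhere in a pipeline.
    """
    for line in src:
        if not line.strip():
            continue  # ignore blank lines
        ticket = parse_ticket(line)
        ticket["verified"] = "yes" if ticket.get("from", "") in allowed else "no"
        dst.write(format_ticket(ticket))


def main() -> None:
    # As a pipeline stage, the script would read stdin and write stdout.
    # The allowed address is a placeholder.
    tag_sender_stage(sys.stdin, sys.stdout, allowed={"editor@example.org"})

# To use this as a real stage, call main() at the bottom of the script.
```

Because the stage only touches stdin and stdout, it could stand in for a gawk script without the neighbouring commands noticing, provided it speaks whatever the real ticket format turns out to be.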
Another benefit: debugging. Remove the cleanup command from the pipeline, re-run and perform a postmortem on what was done. Or, want to see how the protocol is used between commands? Here is an example that captures the result of headers and sends it to a file (while still maintaining the pipeline).
pop3fetch | maketicket | headers | tee headers.out | verify ..
IMHO, this is pretty elegant. But, then again, I'm pretty weird.
Permalink | Tuesday, January 17 9:17 AM
On the other hand...
What if I just require that each story submission (via email) include a special key (password) and just do auto-approval and instant publishing?
Subject: agEdsX45sad Security at the expense of simplicity?
The user would be required to memorize (nah!) or cut and paste a fairly abstract password. This should dramatically reduce the occurrence of email forgery. You may forge the "From/Return-Path", but you would also have to guess at a password.
Given spammer profiles, a spammer/attacker is more likely to be someone who has snooped out a submitter's email address (not someone who has broken into their account and bothered to review "sent" email for password hints). Sending this password in plain text should not be a problem. You would have to be truly targeted to start worrying about the hoops an attacker would have to go through to illegally post a story to the website!
Rather than call this a "password", it is probably best called a poster ID or poster key....
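A rough sketch of the poster key check, in Python, might look like the following. It assumes the key is the first whitespace-separated word of the Subject line (as in the example subject above) and that Storybot keeps a table of poster names and keys; the function names and the table are hypothetical.

```python
import hmac
from typing import Dict, Optional, Tuple


def split_poster_key(subject: str) -> Tuple[str, str]:
    """Split 'KEY rest of subject' into (key, title). Assumes the key comes first."""
    key, _, title = subject.strip().partition(" ")
    return key, title.strip()


def find_poster(subject: str, poster_keys: Dict[str, str]) -> Optional[str]:
    """Return the poster whose key leads the subject line, or None if no key matches."""
    candidate, _ = split_poster_key(subject)
    matched = None
    for poster, key in poster_keys.items():
        # compare_digest compares in constant time, so a failed guess
        # does not reveal how many leading characters were right.
        if hmac.compare_digest(candidate.encode(), key.encode()):
            matched = poster
    return matched
```

A story whose subject yields no poster would simply be dropped (or held), and the title returned by split_poster_key is what would be published.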
Permalink | Saturday, January 14 3:40 PM
The more I think about implementing Storybot security by having an editor approve each submitted story by email, the more I dislike it. It just feels too cumbersome. I can see an editor having to slog through a mailbox full of story submittals and needing to hit reply for each and every one. Ugh.
Here is an alternative strategy: Every submitted story is published to a preview (staging) site and each submitter (and the editor) is sent an email receipt. This allows the submitter (upon receipt) to view their story within the context of the website (but at a "private" preview host) and reply only if they want to delete it.
But how (and when) does the real site get updated? How about this: The editor is emailed a cryptographically generated key (a really long password). The key appears in the subject line. This email is saved. Every time the editor wishes to publish the preview website to the real one, she simply replies to the email.
This has several benefits. First, the key is pretty much unforgeable (32+ bytes?). Second, since it is in email, the key never has to be typed in. Third, we still make sure the key comes from the editor account. This sounds, on the surface at least, reasonably secure.
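The key scheme above can be sketched in a few lines of Python. This is only an illustration under assumptions: the key is generated with the standard secrets module, the reply's subject is taken to contain the key as one whitespace-separated word (for example "Re: <key>"), and the function names and addresses are made up.

```python
import hmac
import secrets


def new_publish_key(nbytes: int = 32) -> str:
    """Generate a random, URL-safe key from nbytes of randomness (32 by default)."""
    return secrets.token_urlsafe(nbytes)


def is_publish_request(sender: str, subject: str, editor: str, key: str) -> bool:
    """True if the mail claims to come from the editor and its subject carries the key."""
    if sender.strip().lower() != editor.strip().lower():
        return False
    # A reply subject looks like "Re: <key>", so test every word in it.
    return any(
        hmac.compare_digest(word.encode(), key.encode())
        for word in subject.split()
    )
```

The sender check alone proves little, since the From address can be forged; it is the combination with the long random key that carries the weight.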
Permalink | Saturday, January 14 2:24 PM
Storybot is probably 80% done. And, as we all know from experience, the last few percent are the hardest and longest.
The core email content submittal stuff appears to work well. I have not stressed it and I am sure there will be email it receives that it will have difficulty handling, but it is usable.
What's left? Well, the "approval" capability still isn't done. Currently, storybot is a black hole. There is no capability either to respond with a confirmation/denial to the submitter or to forward the submission to an editor for approval. For this I will need a little logic and a lot of SMTP.
I am holding back a bit on this to see if I can at least get a basic (minimal) storybot running under EFX in a live environment. I fear that the whole content approval (and management: edit,resubmit,delete,etc) will grow and grow in complexity. This is why I am taking a step back and thinking deep(er) thoughts.
Do I want to do this in gawk? Should I go back to Tcl? Is this where Perl re-inserts itself? These are important questions for me. You see, I chose Tcl as the core extension/scripting language of EFX. The core was all Tcl, but is now mostly Tcl with some gawk for the XML. All of the page generation stuff is Tcl based and that will not change (I still think that Tcl makes for a beautiful extension language for EFX). The core EFX engine may change: I do a poor man's version of closures using Tcl's namespaces, and that begs to be done in a language with support for real closures. (Hint: each page is a closure -- each page captures variables and has its own playground without affecting other pages, and each page generated from it captures the current variables in its own closure -- whew.)
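For readers who want to see the "each page is a closure" idea spelled out, here is a small sketch in Python (not Tcl, and not the EFX code, which is unreleased). The names make_page, render and spawn, and the use of str.format as a stand-in template engine, are all assumptions for illustration.

```python
from typing import Dict, Optional


def make_page(name: str, parent_vars: Optional[Dict[str, str]] = None, **overrides: str):
    """Return (render, spawn) closures that share one private variable table."""
    # Copy the parent's variables so changes here never leak back to the parent.
    variables = dict(parent_vars or {})
    variables.update(overrides)
    variables["page"] = name

    def render(template: str) -> str:
        # Fill the template from this page's own captured variables.
        return template.format(**variables)

    def spawn(child_name: str, **child_overrides: str):
        # A generated page starts from a snapshot of this page's current variables.
        return make_page(child_name, variables, **child_overrides)

    return render, spawn


render_home, spawn_home = make_page("home", site="School News")
render_story, _ = spawn_home("story-1", author="pat")
```

Here render_story sees site, page and author, while render_home never sees author: each page has its own playground, and each generated page inherits a copy of its parent's variables.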
All of this is starting to suggest to me that I am stretching gawk (and Tcl) beyond reason. Maybe I need to start thinking of this as a prototype system and start considering a more robust language for the infrastructure.
Something like: Tcl as EFX extension language + Scheme for EFX core and Storybot?
Deep thoughts...
Permalink | Friday, January 13 11:44 AM
Email-based CMS is (IMHO) a neat idea. That you don't have to access a web form to post content makes content management soooo much easier. There are a few nits (e.g. email is not as immediate, deleting and editing already posted content will be a bit cumbersome), but generally it makes content posting simple.
Unfortunately, there is one main problem that rears its ugly head: security.
Since email addresses (From/Reply-To) can be forged, the plan is to have every submitted piece of content be forwarded to a reviewer/editor for approval (the reviewer simply "replies" to Storybot with the story ID in the subject line). Any email received by Storybot that is a "reply" and has a valid story ID will re-insert that story back into the pipeline for publishing. This may get cumbersome...
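The approval step described above could be sketched roughly as follows in Python. The story ID format ("story-" plus digits), the pending table and the function name are assumptions made for the example; the post does not say what a real story ID looks like.

```python
import re
from typing import Dict, Optional

# ASSUMED story ID format: "story-" followed by digits.
STORY_ID = re.compile(r"\bstory-\d+\b")


def approved_story(subject: str, pending: Dict[str, str]) -> Optional[str]:
    """Remove and return the pending story named in a reply subject, else None."""
    if not subject.lstrip().lower().startswith("re:"):
        return None  # not a reply, so it cannot be an approval
    match = STORY_ID.search(subject)
    if match is None:
        return None  # no story ID in the subject
    # pop() makes each approval one-shot: a second reply finds nothing.
    return pending.pop(match.group(0), None)
```

Whatever approved_story returns would then be re-inserted into the pipeline for publishing.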
Permalink | Wednesday, January 11 9:14 AM
If this makes it through, it will be the first email submitted item on this blog.
If this didn't make it through... you haven't seen this. Got it?
Permalink | Tuesday, January 10 7:47 PM
As I related in the last post, I wanted to reduce the Tcl library dependencies. The XML parsing in core EFX is being done with a simple awk script I found in a newsgroup. This removed the expat/TclXML dependency.
Then I hacked together an ftp client in gawk. This removes the final tcllib dependency. EFX now only requires a simple clean build of Tcl (8.3?) and the tclsh binary. This should work with stock Tcl compiles.
Yay. Now I can start concentrating on fixing nits (such as the fact that the core chokes on non-bare dollar signs as if they were variables!). Ugh.
Enough for tonight.
Permalink | Monday, January 09 10:31 PM
With the decision to do storybot in gawk, I may appear to be moving away from a Tcl-based EFX system. This is not necessarily true. Tcl is still very much at the heart of EFX. It is used for the core template engine as well as the template programming language (for content management apps like the navigational menu generator, story publisher and calendar).
I went with gawk for the driving mechanism behind storybot and possibly for other things that Tcl is being used for, such as the XML parser and FTP publishing system. Why? Tcl is perfectly suited for these tasks, but currently requires more baggage (expat and tcllib) than I had wanted. Ideally, I could use the stock Tcl provided by whatever distribution (Linux, FreeBSD, cygwin, etc.) and not require any special software installed by the user. Gawk is available for a lot of systems as the default awk. Where it is not (Solaris, BSD?), I would just require a simple gawk installation. By using more than the Tcl core, I would require an additional installation of tcllib and a compile of expat (for the Tcl XML parser).
Gawk has other things going for it too. It is dead simple and forced me to consider structuring the system as a lot of simple processes connected via pipes. This is perfect for a CMS workflow system. Using Unix pipes also reduces the dependency on language (you can replace the awk apps with other implementations at any step). State is only carried by the tickets passed between the apps via the pipes.
Permalink | Sunday, January 08 8:45 AM
Look, ma! Permalinks.... finally.
To do (reminders for me):
Permalink | Wednesday, January 04 9:45 AM